{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "a6b56b1c7b76"
      },
      "outputs": [],
      "source": [
        "# Copyright 2021 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "976753012196"
      },
      "source": [
        "# Feedback or issues?\n",
        "For any feedback or questions, please open an [issue](https://github.com/googleapis/python-aiplatform/issues)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8c3048cd1427"
      },
      "source": [
        "# Vertex SDK for Python: BigQuery Custom Container Training Example\n",
        "To use this Jupyter notebook, copy the notebook to a Google Cloud Notebooks instance and open it. You can run each step, or cell, and see its results. To run a cell, use Shift+Enter. Jupyter automatically displays the return value of the last line in each cell. For more information about running notebooks in Google Cloud Notebook, see the [Google Cloud Notebook guide](https://cloud.google.com/vertex-ai/docs/general/notebooks).\n",
        "\n",
        "This notebook demonstrate how to create a Custom Model using Custom Container Training and a Big Query Dataset. It will require you provide a bucket where the dataset will be stored.\n",
        "\n",
        "Note: you may incur charges for training, prediction, storage or usage of other GCP products in connection with testing this SDK."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JpYscdzmf4Gu"
      },
      "source": [
        "# Ensure the following APIs are enabled:\n",
        "- [BigQuery](https://console.cloud.google.com/apis/library/bigquery.googleapis.com?q=BigQuery)\n",
        "- [Cloudbuild](https://console.cloud.google.com/apis/library/cloudbuild.googleapis.com?q=Cloudbuild)\n",
        "- [Container Registry](https://console.cloud.google.com/apis/library/containerregistry.googleapis.com?q=container%20registry)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xOMNWzTbftDr"
      },
      "source": [
        "# Install Vertex SDK for Python\n",
        "\n",
        "\n",
        "After the SDK installation the kernel will be automatically restarted."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Be020jY-ftDv"
      },
      "outputs": [],
      "source": [
        "!pip3 uninstall -y google-cloud-aiplatform\n",
        "!pip3 install google-cloud-aiplatform\n",
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "181b681faf5c"
      },
      "source": [
        "### Enter your project and GCS bucket\n",
        "\n",
        "Enter your Project Id in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "d61oYG3KftDw"
      },
      "outputs": [],
      "source": [
        "MY_PROJECT = \"YOUR PROJECT ID\"\n",
        "MY_STAGING_BUCKET = \"gs://YOUR BUCKET\"  # bucket should be in same region as ucaip"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5T1d5uBoftDw"
      },
      "source": [
        "# Copy Big Query Iris Dataset\n",
        "We will make a Big Query dataset and copy Big Query's public iris table to that dataset. For more information about this dataset please visit: https://archive.ics.uci.edu/ml/datasets/iris "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DJF047yNftDw"
      },
      "source": [
        "### Make BQ Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9yOl-l_oftDx"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "os.environ[\"GOOGLE_CLOUD_PROJECT\"] = MY_PROJECT\n",
        "!bq mk {MY_PROJECT}:ml_datasets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Xn9TuBZAftDx"
      },
      "source": [
        "### Copy bigquery-public-data.ml_datasets.iris"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ISFR8nFfftDx"
      },
      "outputs": [],
      "source": [
        "!bq cp -n bigquery-public-data:ml_datasets.iris {MY_PROJECT}:ml_datasets.iris"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IltwFqKIftDx"
      },
      "source": [
        "# Create Training Container\n",
        "We will create a directory and write all of our container build artifacts into that folder."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "40BkhtMeftDy"
      },
      "outputs": [],
      "source": [
        "CONTAINER_ARTIFACTS_DIR = \"demo-container-artifacts\"\n",
        "\n",
        "!mkdir {CONTAINER_ARTIFACTS_DIR}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iVeG-LPOftDy"
      },
      "source": [
        "### Create Cloudbuild YAML"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4kODuFZCftDy"
      },
      "outputs": [],
      "source": [
        "cloudbuild_yaml = \"\"\"steps:\n",
        "- name: 'gcr.io/cloud-builders/docker'\n",
        "  args: [ 'build', '-t', 'gcr.io/{MY_PROJECT}/test-custom-container', '.' ]\n",
        "images: ['gcr.io/{MY_PROJECT}/test-custom-container']\"\"\".format(\n",
        "    MY_PROJECT=MY_PROJECT\n",
        ")\n",
        "\n",
        "with open(f\"{CONTAINER_ARTIFACTS_DIR}/cloudbuild.yaml\", \"w\") as fp:\n",
        "    fp.write(cloudbuild_yaml)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gQ_GUCtZftDz"
      },
      "source": [
        "### Write Dockerfile"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Rja_jo3rftDz"
      },
      "outputs": [],
      "source": [
        "%%writefile {CONTAINER_ARTIFACTS_DIR}/Dockerfile\n",
        "\n",
        "# Specifies base image and tag\n",
        "FROM gcr.io/google-appengine/python\n",
        "WORKDIR /root\n",
        "\n",
        "# Installs additional packages\n",
        "RUN pip3 install tensorflow tensorflow-io pyarrow\n",
        "\n",
        "# Copies the trainer code to the docker image.\n",
        "COPY test_script.py /root/test_script.py\n",
        "\n",
        "# Sets up the entry point to invoke the trainer.\n",
        "ENTRYPOINT [\"python3\", \"test_script.py\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9dfrLShaftDz"
      },
      "source": [
        "### Write entrypoint script to invoke trainer"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "S0jSd8NWftDz"
      },
      "outputs": [],
      "source": [
        "%%writefile {CONTAINER_ARTIFACTS_DIR}/test_script.py\n",
        "\n",
        "from tensorflow.python.framework import ops\n",
        "from tensorflow.python.framework import dtypes\n",
        "from tensorflow_io.bigquery import BigQueryClient\n",
        "from tensorflow_io.bigquery import BigQueryReadSession\n",
        "import tensorflow as tf\n",
        "from tensorflow import feature_column\n",
        "import os\n",
        "\n",
        "training_data_uri = os.environ[\"AIP_TRAINING_DATA_URI\"]\n",
        "validation_data_uri = os.environ[\"AIP_VALIDATION_DATA_URI\"]\n",
        "test_data_uri = os.environ[\"AIP_TEST_DATA_URI\"]\n",
        "data_format = os.environ[\"AIP_DATA_FORMAT\"]\n",
        "\n",
        "def caip_uri_to_fields(uri):\n",
        "    uri = uri[5:]\n",
        "    project, dataset, table = uri.split('.')\n",
        "    return project, dataset, table\n",
        "\n",
        "feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']\n",
        "\n",
        "target_name = 'species'\n",
        "\n",
        "def transform_row(row_dict):\n",
        "  # Trim all string tensors\n",
        "  trimmed_dict = { column:\n",
        "                  (tf.strings.strip(tensor) if tensor.dtype == 'string' else tensor) \n",
        "                  for (column,tensor) in row_dict.items()\n",
        "                  }\n",
        "  target = trimmed_dict.pop(target_name)\n",
        "\n",
        "  target_float = tf.cond(tf.equal(tf.strings.strip(target), 'versicolor'), \n",
        "                 lambda: tf.constant(1.0),\n",
        "                 lambda: tf.constant(0.0))\n",
        "  return (trimmed_dict, target_float)\n",
        "\n",
        "def read_bigquery(project, dataset, table):\n",
        "  tensorflow_io_bigquery_client = BigQueryClient()\n",
        "  read_session = tensorflow_io_bigquery_client.read_session(\n",
        "      \"projects/\" + project,\n",
        "      project, table, dataset,\n",
        "      feature_names + [target_name],\n",
        "      [dtypes.float64] * 4 + [dtypes.string],\n",
        "      requested_streams=2)\n",
        "\n",
        "  dataset = read_session.parallel_read_rows()\n",
        "  transformed_ds = dataset.map(transform_row)\n",
        "  return transformed_ds\n",
        "\n",
        "BATCH_SIZE = 16\n",
        "\n",
        "training_ds = read_bigquery(*caip_uri_to_fields(training_data_uri)).shuffle(10).batch(BATCH_SIZE)\n",
        "eval_ds = read_bigquery(*caip_uri_to_fields(validation_data_uri)).batch(BATCH_SIZE)\n",
        "test_ds = read_bigquery(*caip_uri_to_fields(test_data_uri)).batch(BATCH_SIZE)\n",
        "\n",
        "feature_columns = []\n",
        "\n",
        "# numeric cols\n",
        "for header in feature_names:\n",
        "  feature_columns.append(feature_column.numeric_column(header))\n",
        "\n",
        "feature_layer = tf.keras.layers.DenseFeatures(feature_columns)\n",
        "\n",
        "Dense = tf.keras.layers.Dense\n",
        "model = tf.keras.Sequential(\n",
        "  [\n",
        "    feature_layer,\n",
        "      Dense(16, activation=tf.nn.relu),\n",
        "      Dense(8, activation=tf.nn.relu),\n",
        "      Dense(4, activation=tf.nn.relu),\n",
        "      Dense(1, activation=tf.nn.sigmoid),\n",
        "  ])\n",
        "\n",
        "# Compile Keras model\n",
        "model.compile(\n",
        "    loss='binary_crossentropy', \n",
        "    metrics=['accuracy'],\n",
        "    optimizer='adam')\n",
        "\n",
        "model.fit(training_ds, epochs=5, validation_data=eval_ds)\n",
        "\n",
        "print(model.evaluate(test_ds))\n",
        "\n",
        "tf.saved_model.save(model, os.environ[\"AIP_MODEL_DIR\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6LYlV4D2ftD0"
      },
      "source": [
        "### Build Container"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9tGdX7B_ftD1"
      },
      "outputs": [],
      "source": [
        "!gcloud builds submit --config {CONTAINER_ARTIFACTS_DIR}/cloudbuild.yaml {CONTAINER_ARTIFACTS_DIR}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Pf0pugbvftD1"
      },
      "source": [
        "# Run Custom Container Training"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7ee691569d8d"
      },
      "source": [
        "## Initialize Vertex SDK for Python\n",
        "\n",
        "Initialize the *client* for Vertex AI"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vEEr62NUftD1"
      },
      "outputs": [],
      "source": [
        "from google.cloud import aiplatform\n",
        "\n",
        "aiplatform.init(project=MY_PROJECT, staging_bucket=MY_STAGING_BUCKET)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "736ddff8408b"
      },
      "source": [
        "# Create a Managed Tabular Dataset from Big Query Dataset\n",
        "\n",
        "This section will create a managed Tabular dataset from the iris Big Query table we copied above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oBdOv6lWftD1"
      },
      "outputs": [],
      "source": [
        "ds = aiplatform.TabularDataset.create(\n",
        "    display_name=\"bq_iris_dataset\", bq_source=f\"bq://{MY_PROJECT}.ml_datasets.iris\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ee242cc1f74c"
      },
      "source": [
        "# Launch a Training Job to Create a Model\n",
        "\n",
        "We will train a model with the container we built above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uRGrFdxOftD1"
      },
      "outputs": [],
      "source": [
        "job = aiplatform.CustomContainerTrainingJob(\n",
        "    display_name=\"train-bq-iris\",\n",
        "    container_uri=f\"gcr.io/{MY_PROJECT}/test-custom-container:latest\",\n",
        "    model_serving_container_image_uri=\"gcr.io/cloud-aiplatform/prediction/tf2-cpu.2-2:latest\",\n",
        ")\n",
        "model = job.run(\n",
        "    ds,\n",
        "    replica_count=1,\n",
        "    model_display_name=\"bq-iris-model\",\n",
        "    bigquery_destination=f\"bq://{MY_PROJECT}\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a7fa9b59f919"
      },
      "source": [
        "# Deploy Your Model\n",
        "\n",
        "Deploy your model, then wait until the model FINISHES deployment before proceeding to prediction."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tEg2IDwPftD2"
      },
      "outputs": [],
      "source": [
        "endpoint = model.deploy(machine_type=\"n1-standard-4\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4dbd6c650a03"
      },
      "source": [
        "# Predict on the Endpoint"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DKVhGB1PftD2"
      },
      "outputs": [],
      "source": [
        "endpoint.predict(\n",
        "    [{\"sepal_length\": 5.1, \"sepal_width\": 2.5, \"petal_length\": 3.0, \"petal_width\": 1.1}]\n",
        ")"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "AI_Platform_(Unified)_SDK_BigQuery_Custom_Container_Training.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
